Importing data

Data Structure

The original dataset bullying_network.xlsx is an Excel file that contains two sheets about bullying within a group of 25 students in an Italian school class: the first one encloses three attributes per student; the second one represents the adjacency matrix of a directed network depicting who bullies whom among the students.

The first sheet, the attributes one, contains:

  • gender, mapped in females (0) and males (1);
  • grade, which goes from 0 to 10;
  • ethnicity, divided into three categories: first-generation immigrants (born abroad with foreign parents), second-generation immigrants (born in Italy with foreign parents) and Italians (born in Italy with at least one Italian parent).

Data pre-processing

As forehead described, the original Excel file contains two sheets. To retrieve information there stored, a first approach exploited the read_excel(path, sheet) function and saved the two sheets in different variables. Since this attempt implied a network misreading of 0s and 1s as characters instead of double and creating an undirected graph, we opted for another importing action: the original Excel file has been separated manually into two different csv files:

  • bullying_attributes.csv: containing the attributes of students inside the network;
  • bullying_network.csv: containing the actual adjacency matrix (i.e. the network).

Libraries

library(tidyverse)
library(sna)
library(knitr)
library(itsadug) # For gradient legend
library(formattable) # For nice tables
library(igraph)
library(visNetwork)

Data import

Import network data

With the following code, the network is imported, converted into matrix and then into directed network.

# Importing data
net = as.matrix(read.csv("bullying_network.csv",
                         stringsAsFactors=FALSE, row.names=1))
# Converting the adjacency matrix to a network
net = as.network(net, directed=T)
# Inspect the network
summary(net, print.adj = FALSE)
## Network attributes:
##   vertices = 25
##   directed = TRUE
##   hyper = FALSE
##   loops = FALSE
##   multiple = FALSE
##   bipartite = FALSE
##  total edges = 76 
##    missing edges = 0 
##    non-missing edges = 76 
##  density = 0.1266667 
## 
## Vertex attributes:
##   vertex.names:
##    character valued attribute
##    25 valid vertex names
## 
## No edge attributes

By inspecting the obtained network through summary(net), we may notice:

  • the number of vertices (i.e., nodes), which are 25 students;
  • that the network is directed;
  • that no hyperedges, loops or multiple edges are allowed;
  • that there are overall 76 directed edges;
  • that the density is 0.127;
  • that nodes are labeled with names, namely the students’ first names.

Import attributes

Students attributes have been saved in a distinct file and, therefore, imported separately.

# Reading the csv file with node attributes with given types for summary
attr = read.csv("bullying_attributes.csv", 
                colClasses = c("character","factor","numeric","factor"))
summary(attr)
##       ID            Gender     Grade       Ethnicity
##  Length:25          0:12   Min.   : 3.00   1: 5     
##  Class :character   1:13   1st Qu.: 5.00   2: 6     
##  Mode  :character          Median : 6.00   3:14     
##                            Mean   : 6.16            
##                            3rd Qu.: 8.00            
##                            Max.   :10.00

As can be seen from the summary, there are more males than females, but this distinction is not significant since the class has an odd number of students. Instead, ethnicity registers a more significant difference, showing 11 immigrants and 14 Italians. The minimum grade is 3, although marks start typically from 0, while the maximum is 10. Both median and mean of the class are around 6, which corresponds to the sufficiency in the Italian system.

# Re-importing as numerical for the following plots and measures
attr = read.csv("bullying_attributes.csv")

Drawing a network

Basic Draw

par(mar=c(0,0,0,0)) # Margin deletion
gplot(net,
      gmode="digraph",    # directed network
      mode="fruchtermanreingold", # set mode as Fruchterman-Reingold
      jitter=F,           # do not allow nodes to be "jittered"
      edge.col="grey70",  # set color of ties
      vertex.col="cornflowerblue",   # set color of nodes
      vertex.sides = 100,
      displaylabels=T,    # indicate that labels should be included
      label.pos=1,        # indicate that labels should be given below points
      label.cex=.7,       # indicate the size of the labels (1 is default)
      arrowhead.cex = 0.4)

The Fruchterman-Reingold layout is the default mode of the gplot() function, and therefore it is not necessary to include it. Overall there are 25 labelled nodes connected through directed links representing who bullies whom. In particular, there is a student, Roberto, that does not present incoming or outgoing edges and, following the definition of Wasserman & Faust (1994), it is an isolate1. Therefore, Roberto is the only student in this specific school class that is not bullied and does not bully either.

Attribute-based network

Once again, let’s consider the three attributes and their respective values:

  • Gender: binomial variable 0/1;
  • Ethnicity: categorical variable assuming three different values (1, 2 or 3);
  • Grade: categorical variable assuming 11 different values (from 0 to 10).

These three attributes are all categorical data and, specifically, can be divided into nominal (gender and ethnicity) and ordinal (grade). This distinction implies a further difference in the visualisation. Whereas the nominal values can be mapped using variation in colour hue or shape, the ordinal ones may lead to different conclusions. Below two versions are displayed, using a combination of shape, colour and size. The first one variates the datum size (area) and fits data based on proportional scales where the larger sizes mean greater quantities (e.g., grade 10). The second option exploits the colours’ lightness. Thus, lightness variation can represent quantitative scales, where the darker the colour, the greater the quantity2.

First version

A first version considers the variations in:

  • size as grade;
  • shape as gender;
  • colour as ethnicity.
colours_1 = c("#2a9d8f", "#e9c46a", "#e76f51")
sizes = seq(0.1, 4, length.out=11)

par(mar=c(0,0,0,0))
gplot(net,
      gmode="digraph",    # directed network
      jitter=T,
      edge.col="grey70",  # set color of ties
      vertex.col=colours_1[attr$Ethnicity], # set color of nodes depending on ethnicity
      vertex.cex = sizes[attr$Grade], # set size of nodes depending on grade
      vertex.sides= (attr$Gender-1)*47+4, # set shape of nodes depending on gender
            vertex.rot = 45, # Nodes rotation
      arrowhead.cex = 0.5) # Size of the arrows head

In this representation, both gender and ethnicity are evident, while the size does not accurately represent the grade. Moreover, to visualize the grade, a size vector proportionally maps the node’s size to its grade. Although this combination satisfies the assigned requirements, the user might experience difficulties distinguishing the grades differences.

In the following chunk, a legend is added to guide the user in the reading of the network:

  • ethnicity, through hue colour, represents 1st and 2nd generation of immigrants and Italians;
  • shapes map the gender of students: circles for males and squares for females;
  • grades vary in size, whose proportion is reported in the legend. Notice that the legend’s proportion does not precisely match the sizes displayed in the network.
sizes = seq(0.2, 3, length.out=11)

par(mar=c(0,0,0,0)) # Margin deletion
gplot(net,
      gmode="digraph",    # directed network
      jitter=T,           
      edge.col="grey70",  # set color of ties
      vertex.col=colours_1[attr$Ethnicity],   # set color of nodes depending on ethnicity 
      vertex.cex = sizes[attr$Grade], # set size of nodes depending on grade
      vertex.sides= (attr$Gender-1)*47+4, # set shape of nodes depending on gender  
      vertex.rot = 45,
      arrowhead.cex = 0.5,
      displaylabels=T,    # indicate that labels should be included
      label.pos=1,        # indicate that labels should be given below points
      label.cex=.7)       # indicate the size of the labels (1 is default)

legend("topleft",
       legend = c("Ethnicity"," 1st gen immigrants", " 2st gen immigrants", " Italian",
                  " ", "Gender", " Male", " Female",
                  " ", "Grade", 0:10),
       col = c("white",colours_1[1],colours_1[2],colours_1[3],
               "white","white","black","black",
               "white", "white", rep("black",11)), 
       bty = "n", 
       pch =c(15,15,15,15,
              15,15,19,15,
              15,15,rep(19,11)),
       pt.cex = c(0,2.1,2.1, 2.1,
                  0,0,2.1,2.1,
                  0, 0, (0:10)/5), 
       cex = 1, 
       text.col = "grey10", 
       horiz = F , 
       inset = c(0.01))

Second version

As stated above, the second approach represents the attributes in a different way:

  • shape: ethnicity (i.e., first-generation as triangles, second-generation as squares and Italians as circles);
  • colour hue: gender (i.e., females as red, males as green);
  • colour lightness: grade (the darker the higher, the lighter the lower).
colours_2 = c('#fff0f3', '#ffccd5', '#ffb3c1', '#ff8fa3', '#ff758f', # Reds
              '#ff4d6d', '#c9184a', '#a4133c', '#800f2f', '#590d22', # Reds
              "#d8f3dc","#b7e4c7","#95d5b2","#74c69d","#52b788",     # Greens
              "#40916c", "#2d6a4f","#245741","#1b4332","#081c15"     # Greens
)

shapes = c(3,4,50)

par(mar=c(0,0,0,0)) # Margin deletion
gplot(net,
      gmode="digraph",    # directed network
      jitter=F,           # do not allow nodes to be "jittered"
      edge.col="grey70",  # set color of ties
      vertex.col=colours_2[attr$Gender*10+attr$Grade+1],   # set color of nodes
      vertex.cex = 1.3,
      vertex.rot = 45,
      vertex.sides= shapes[attr$Ethnicity],
      arrowhead.cex = 0.5)

Despite ethnicity losing evidence, the grade variable acquires a central role in this visualisation. Thus, the use of colour lightness to highlight the grades (i.e. higher grades with darker colours) emphasizes the differences more than in the previous case where size was employed.

Notice that the exercise also required the use of size. However, applying size variation to gender or ethnicity may unconsciously suggest a social superiority of one of the two categories, which does not exist. Thus, this visualisation does not consider the size to avoid any possible misunderstanding.

In this case, the legend shows:

  • ethnicity as shapes;
  • gender as one of the two colours;
  • grade as a gradient, varying between males and females, whose lightness relies on the students’ grade.

This solution may help understand the grade distribution among students, but getting the exact grade is still challenging.

par(mar=c(0,0,0,0)) # Margin deletion
gplot(net,
      gmode="digraph",    # directed network
      jitter=T,           # do not allow nodes to be "jittered"
      edge.col="grey70",  # set color of ties
      vertex.col=colours_2[attr$Gender*10+attr$Grade+1],   # set color of nodes
      vertex.cex = 1,
      vertex.rot = 45,
      vertex.sides= shapes[attr$Ethnicity],
      displaylabels=T,    # indicate that labels should be included
      arrowhead.cex = 0.5,
      label.pos=1,        # indicate that labels should be given below points
      label.cex=.7)       # indicate the size of the labels (1 is default))

# Legend for gender and ethnicity
legend("topleft",
       legend = c("Ethnicity"," 1st gen immigrants", " 2st gen immigrants", " Italian",
                  " ", "Gender", " Male", " Female",
                  " ", "Grade"),
       col = c("white","black","black", "black",
               "white","white","#52b788","#ff4d6d",
               "white", "white"), 
       bty = "n", 
       pch =c(15,17,15,19,
              15,15,15,15,
              15, 15),
       pt.cex = c(0,2.1,2.1, 2.2,
                  0,0,2.1,2.1,
                  0,0), 
       cex = 1, 
       text.col = "grey10", 
       horiz = F , 
       inset = c(0.01))

# Gradient legend
pal = colorRampPalette(colours_2[1:10])
gradientLegend(valRange=c(0,10), 
               color = pal(10),
               pos=c(0.05,0.1,.08,0.5), 
               side=4, 
               n.seg=1,
               border.col = "grey10",
               inside = T)

pal = colorRampPalette(colours_2[11:20])
gradientLegend(valRange=c(0,10), 
               color = pal(10),
               pos=c(0.12,0.1,.15,0.5), side=4,
               n.seg=1,
               border.col = "grey10",
               inside = T)

Third interactive version

This third version exploits the package visNetwork to build an interactive visualization of the network, reproducing the first version forehead proposed, where:

  • shapes indicate the gender (i.e. females as circles, males as squares);
  • colour hue indicates the ethnicity, as shown in the legend;
  • the size is handled based on the grade of the single student.

Moreover, ethnicity can be chosen through the selection above the plot.

# Creation of the igraph network
i_net<-graph_from_adjacency_matrix(as.matrix(read.csv("bullying_network.csv",
                                                      row.names = 1, 
                                                      stringsAsFactors = F)),
                                   mode="directed", diag=F)

# Setting colours, shapes, sizes, ethnicity, gender, title
V(i_net)$color = colours_1[attr$Ethnicity]
V(i_net)$shape = c("dot","square")[attr$Gender+1]
V(i_net)$size = attr$Grade*3
V(i_net)$ethnicity = ifelse(attr$Ethnicity==3, "Italian",
                            ifelse(attr$Ethnicity == 1,"1st immigrant", "2nd immigrant"))
V(i_net)$gender = ifelse(attr$Gender == 0, "Female","Male")
V(i_net)$title = paste0("<p><b>", attr$ID,"</b><br>",
                        V(i_net)$gender,", ",attr$Grade,", ",
                        V(i_net)$ethnicity,"</p>")

# Dataframe for the legend
lnodes <- data.frame(label = c("1st immigrants","2nd immigrants","Italians","Females", "Males"),
                     shape = c(rep("triangle",3),"dot","square"), 
                     color = c(colours_1,"#B78FB5","#B78FB5"),
                     font.size = 10,
                     title = "Informations") 

# Creation of the visNetwork object
new_net = toVisNetworkData(i_net)

# Actual visualisation
visNetwork(nodes = new_net$nodes,
           edges = new_net$edges,
           height = "800px", width = "100%",
           font.size = 9,
           physics = T) %>%
    visEdges(arrows = "to") %>%
    visOptions(highlightNearest = TRUE, 
               selectedBy = "ethnicity") %>%
    visIgraphLayout(layout = "layout.fruchterman.reingold") %>%
    visLegend(addNodes = lnodes, useGroups = FALSE, ncol = 1)

The interactive version offers better exploitation and comprehension of the network through the possibility of visualising all the information related to a node by just clicking on it. By introducing a further layer, the problem of grades definition and comparison is solved.

Centrality

Degree centrality is defined as the number of links incident upon a node. Having a directed network implies splitting the degree into in-degree and out-degree. The former counts the number of incoming edges to a node, while the latter refers to the number of edges a node directs to others3.

When edges express positive aspects (e.g., friendship, collaboration), in-degree can be interpreted as a form of popularity and out-degree as gregariousness. However, a reversed scenario is presented here due to the edges’ negative connotation, i.e. bullying.

In-degree

Let’s first consider the in-degree centrality of nodes with the function degree(network, gmode, cmode="indegree"), which will return the in-degree for each student. The same function can be used to compute the out-degree, setting cmode="outdegree".

# How many people are you bullied from?
indegree = sna::degree(net, gmode="digraph", cmode = "indegree")

# How many people do you bully? 
outdegree = sna::degree(net, gmode="digraph", cmode = "outdegree")

# Overall Degree
degree = sna::degree(net, gmode="digraph")

attr$indegree = indegree
attr$outdegree = outdegree

To properly present the degree centrality measures, a dataframe has been created with students names, in-degree, out-degree and total degree (i.e. the sum of both in and out-degree). In addition, the normalized in-degree is computed by dividing the in-degree over the number of nodes minus \(1\) (i.e. \(n-1\)). The last column indicates whether the single node in-degree is below (or above) the average.

Name In degree Out degree Degree Normalized In degree Above Average (In degree)
Sara 9 2 11 0.38
Gianna 8 1 9 0.33
Marco 8 1 9 0.33
Andrea 7 2 9 0.29
Mario 7 2 9 0.29
Valentina 6 0 6 0.25
Giulia 4 4 8 0.17
Laura 3 0 3 0.12
Sofia 3 2 5 0.12
Angela 2 7 9 0.08
Angelo 2 6 8 0.08
Francesco 2 7 9 0.08
Giovanni 2 3 5 0.08
Giuseppe 2 2 4 0.08
Maria 2 1 3 0.08
Rosa 2 5 7 0.08
Stella 2 6 8 0.08
Anna 1 6 7 0.04
Aurora 1 7 8 0.04
Flavio 1 1 2 0.04
Luigi 1 4 5 0.04
Stefano 1 1 2 0.04
Alessandro 0 5 5 0.00
Luca 0 1 1 0.00
Roberto 0 0 0 0.00

By considering the in-degree centrality, Sara can be defined as the most central node, followed by Gianna and Marco. As stated above, whereas having many incoming edges represents an advantageous condition for the individual in several cases, this specific network emphasizes an unfavourable status. Thus, in-degree measure refers to the number of bullies a node must face. Consequently, Sara results the chosen victim and gets bullied by \(9\) people, which corresponds to the \(0.375\%\) of the network (normalization obtained by \(\frac{in-degree}{n-1} = \frac{9}{24} = 0.375\), with \(n\) number of nodes).

In the worst-case scenario, Sara would be bullied by \(24\) students, and she would be the only victim. However, despite her seeming far from the depicted scenario, her in-degree looks exceptionally high considering the average and the median in-degree. The same happens for students like Gianna, Marco, Andrea, Mario, Valentina and Giulia.

It is also worth noticing that Sara’s condition seems not to be determined by her ethnicity or gender, but by her grade. While at least one member per ethnicity and/or per gender has been bullied, the reason for which Sara is bullied by several students might be the achievement of the highest mark: \(10\).

Measures Value
Most central node Sara
Max in degree 9
(Theoretical) max in degree 24
Least central nodes Alessandro, Luca, Roberto
Min in degree 0
(Theoretical) min in degree 0
Average in degree 3.04
Median in degree 2

As the bar plot below shows, the average is strongly impacted by the unbalanced situation within the network, meaning a group of highly bullied victims against the remaining nodes assessing low in-degree values. The extreme cases (\(in-degree = 0\)) are further analysed in the following sections and might reflect either isolate (i.e. Roberto) or bully figures.

Note that the following network nodes size indicates whether the in-degree is above average (bigger) or not (smaller), exploiting the interactive version proposed in section Third interactive version.

Outdegree

There is no unique central node in the case of the out-degree measure. Thus, Angela, Aurora, and Francesco can be detected as the \(3\) most central nodes (as shown in the table below). It is also worth noticing that their degree measure is assessed at \(9\) and \(8\) and, therefore, they are not the extreme cases hypothesized above (i.e., \(in-degree=0\)). As already performed in the in-degree case, some reference points are computed and reported below.

Name Out degree In degree Degree Normalized Out degree Above Average (Out degree)
Angela 7 2 9 0.292
Aurora 7 1 8 0.292
Francesco 7 2 9 0.292
Angelo 6 2 8 0.250
Anna 6 1 7 0.250
Stella 6 2 8 0.250
Alessandro 5 0 5 0.208
Rosa 5 2 7 0.208
Giulia 4 4 8 0.167
Luigi 4 1 5 0.167
Giovanni 3 2 5 0.125
Andrea 2 7 9 0.083
Giuseppe 2 2 4 0.083
Mario 2 7 9 0.083
Sara 2 9 11 0.083
Sofia 2 3 5 0.083
Flavio 1 1 2 0.042
Gianna 1 8 9 0.042
Luca 1 0 1 0.042
Marco 1 8 9 0.042
Maria 1 2 3 0.042
Stefano 1 1 2 0.042
Laura 0 3 3 0.000
Roberto 0 0 0 0.000
Valentina 0 6 6 0.000
Measures Value
Most central node Angela, Aurora, Francesco
Max out degree 7
(Theoretical) max out degree 24
Least central nodes Laura, Roberto, Valentina
Min out degree 0
(Theoretical) min out degree 0
Average out degree 3.04
Median out degree 2

Each of the three central out-degree nodes bullies almost one-third of the overall network:

\[ \left[ \frac{outdegree}{n-1} = \frac{7}{24} = 0.291 \approx 0.29 \approx \frac{1}{3}\right] \]

These values are highly above the average of 3.04 \(\pm\) 2.423 but do not reach extreme cases, such as in a completely centralized network (i.e. maximum out-degree). Moreover, a visual interpretation of the out-degree distribution in a bar plot graph may help understand the underlying reasons for the low reference points.

Once again, the out-degree measure shows an imbalanced situation assessing a significant gap between the lowest extreme cases (e.g. \(0\)) and the highest ones (i.e.,\(7\)). Overall, \(15\) nodes have an out-degree below average, while the remaining \(10\) nodes score slightly above average or double it in the most central nodes.

Note that the following network nodes size indicates whether the out-degree is above average (bigger) or not (smaller), exploiting the interactive version proposed in section Third interactive version.

Correlations

Grade and in-degree

grade = attr$Grade
cor_value = cor(indegree, grade, method = c("pearson"))
cor_value
## [1] 0.852993

The Pearson coefficient on in-degree and grade, namely the correlation coefficient, is slightly above 0.853. Therefore, a strong and positive linear relationship can be detected between being bullied (i.e., in-degree) and the school results. The number of bullies of a kid linearly depends on their grades. Thus, better marks correspond to a higher probability of being bullied.

cor.test(indegree,grade)
## 
##  Pearson's product-moment correlation
## 
## data:  indegree and grade
## t = 7.838, df = 23, p-value = 6.082e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6906372 0.9334951
## sample estimates:
##      cor 
## 0.852993

Moreover, the correlation test shows a low p-value, indicating a statistically significant relationship between the number of people that bully and the grade they get, illustrated by the following scatter plot. In particular, each data point is described in terms of grade (y-axis) and in-degree (x-axis). The linear relationship between these variables is reported along with its \(95%\) confidence interval.

However, despite the high correlation coefficient, it is worth noticing some fallacies in the model:

  1. Low number of students: since our sample has a small number of subjects, this model is not representative for the entire school system bullying mechanism;
  2. The line fits the majority of points. Nevertheless, some outliers can be detected from the plot (e.g. point at coordinates \(p=(3,9)\)).

For confirming the relationship between grade and being bullied, the ANOVA test is applied, showing a low p-value and therefore a discriminant factor for the studied network. In particular, the pairwise test displays pairs of grades where there is a substantial difference in terms of being bullied. For instance, \(8\) and \(3\) grades tend to be inside the mechanism of bullying, while \(4\) and \(3\) don’t.

res.aov <- aov(indegree ~ as.factor(Grade), data = attr)
summary(res.aov)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## as.factor(Grade)  7 157.44  22.491   14.98 3.87e-06 ***
## Residuals        17  25.52   1.501                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(res.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = indegree ~ as.factor(Grade), data = attr)
## 
## $`as.factor(Grade)`
##            diff        lwr       upr     p adj
## 4-3  -0.5000000 -4.7091325  3.709132 0.9998699
## 5-3   1.3333333 -2.1034089  4.770076 0.8739953
## 6-3   1.6428571 -1.7319568  5.017671 0.7035655
## 7-3   0.5000000 -4.6551134  5.655113 0.9999667
## 8-3   6.1666667  2.3242720 10.009061 0.0007910
## 9-3   5.8333333  1.9909387  9.675728 0.0014273
## 10-3  8.5000000  3.3448866 13.655113 0.0005882
## 5-4   1.8333333 -1.6034089  5.270076 0.6088329
## 6-4   2.1428571 -1.2319568  5.517671 0.4078165
## 7-4   1.0000000 -4.1551134  6.155113 0.9969257
## 8-4   6.6666667  2.8242720 10.509061 0.0003312
## 9-4   6.3333333  2.4909387 10.175728 0.0005905
## 10-4  9.0000000  3.8448866 14.155113 0.0003084
## 6-5   0.3095238 -2.0322213  2.651269 0.9997359
## 7-5  -0.8333333 -5.3797160  3.713049 0.9978363
## 8-5   4.8333333  1.8570272  7.809639 0.0006956
## 9-5   4.5000000  1.5236939  7.476306 0.0014893
## 10-5  7.1666667  2.6202840 11.713049 0.0009599
## 7-6  -1.1428571 -5.6426090  3.356895 0.9848795
## 8-6   4.5238095  1.6192322  7.428387 0.0010913
## 9-6   4.1904762  1.2858988  7.095054 0.0023970
## 10-6  6.8571429  2.3573910 11.356895 0.0013723
## 8-7   5.6666667  0.8063791 10.526954 0.0162923
## 9-7   5.3333333  0.4730458 10.193621 0.0260442
## 10-7  8.0000000  2.0473878 13.952612 0.0047484
## 9-8  -0.3333333 -3.7700756  3.103409 0.9999667
## 10-8  2.3333333 -2.5269542  7.193621 0.7167896
## 10-9  2.6666667 -2.1936209  7.526954 0.5775206

Instead, by focusing on the grade and bullying (i.e. out-degree), the p-value results high, leading to the conclusion that the relationship is not statistically significant.

res.aov <- aov(outdegree ~ as.factor(Grade), data = attr)
summary(res.aov)
##                  Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Grade)  7  60.58   8.654    1.83  0.146
## Residuals        17  80.38   4.728

Gender

This section hosts t-tests which try to identify differences in terms of mean of two different groups: males and females. These tests were led on in-degree and out-degree. However, no significant p-value was obtained from these tests, meaning that gender is not discriminant in the bullying behaviour inside the class.

# Mean difference between males and females (in degree)
t.test(filter(attr,Gender==0)$indegree,
       filter(attr,Gender==1)$indegree)
## 
##  Welch Two Sample t-test
## 
## data:  filter(attr, Gender == 0)$indegree and filter(attr, Gender == 1)$indegree
## t = 0.94555, df = 22.988, p-value = 0.3542
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.241156  3.330900
## sample estimates:
## mean of x mean of y 
##  3.583333  2.538462
# Mean difference between males and females (out degree)
t.test(filter(attr,Gender==0)$outdegree,
       filter(attr,Gender==1)$outdegree)
## 
##  Welch Two Sample t-test
## 
## data:  filter(attr, Gender == 0)$outdegree and filter(attr, Gender == 1)$outdegree
## t = 0.73287, df = 21.115, p-value = 0.4717
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.330422  2.779140
## sample estimates:
## mean of x mean of y 
##  3.416667  2.692308

Ethnicity

For completeness, some tests are run for ethnicity too. Since there are three ethnic groups among students, ANOVA tests investigate pairwise differences. As TukeyHSD() shows, the p-values are not significant and therefore ethnicity is not discriminant for bullying or being bullied.

# Being bullied and ethnicity
res.aov <- aov(indegree ~ as.factor(Ethnicity), data = attr)
summary(res.aov)
##                      Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Ethnicity)  2   0.45   0.223   0.027  0.974
## Residuals            22 182.51   8.296
TukeyHSD(res.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = indegree ~ as.factor(Ethnicity), data = attr)
## 
## $`as.factor(Ethnicity)`
##          diff       lwr      upr     p adj
## 2-1 0.2000000 -4.181304 4.581304 0.9927790
## 3-1 0.3428571 -3.426744 4.112459 0.9716690
## 3-2 0.1428571 -3.387698 3.673412 0.9943215
# Bullying and ethnicity
res.aov <- aov(outdegree ~ as.factor(Ethnicity), data = attr)
summary(res.aov)
##                      Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Ethnicity)  2   2.43   1.213   0.193  0.826
## Residuals            22 138.53   6.297
TukeyHSD(res.aov)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = outdegree ~ as.factor(Ethnicity), data = attr)
## 
## $`as.factor(Ethnicity)`
##           diff       lwr      upr     p adj
## 2-1 -0.9333333 -4.750420 2.883754 0.8139202
## 3-1 -0.6000000 -3.884159 2.684159 0.8909550
## 3-2  0.3333333 -2.742563 3.409230 0.9600440

Permutation-based approach

A permutation-based test (also called re-randomization test) is a statistical significance test in which the distribution of the test under the null hypothesis is obtained by calculating all possible values of the test statistic under all possible rearrangements of the observed data points4.

This test is performed on \(1000\) permutations of the grade over students’ in-degree while computing the Pearson correlation coefficient between them. As the histogram shows, after all the permutations, a Gaussian distribution is reached within the range of values of Pearson coefficient (i.e. \([-1,1]\)), since it is likely to find no correlations between random permutations of grade and in-degree than perfect ones. Moreover, as indicated by the straight vertical line, the Pearson coefficient \(\rho\) indicates that, after \(1000\) permutations, the correlation is more statistically significant than by chance.

set.seed(21)

permutation = matrix(NA,1000,1)
for (k in c(1:1000))
{
  grade_perm = sample(grade)
  permutation[k,1] = cor(indegree, grade_perm)
}

# Histogram
hist(permutation, prob=TRUE, 
     xlim=c(-1.0,1.0), ylim = c(0.0, 2.0), 
     xlab = "Value of permutation", col = "#C7A8C5")

x <- seq(min(permutation), max(permutation))

# Gaussian curve
curve(dnorm(x, mean=mean(permutation), sd=sd(permutation)), 
      col="#7C5079", lwd=2, add=TRUE)
text(x = 0.56, y = cor(indegree, grade), 
     expression(paste("N~(", mu, "= 0.0, ", sigma, "= 0.2)")), col = "#3E283D")

# P-value line
abline(v=cor(indegree, grade), lwd=2, col="#7C5079")
text(x = cor(indegree, grade)+0.1, y = cor(indegree, grade),
     expression(paste(rho, ": 0.85")), col = "#7C5079")

As the histogram above shows, after all the permutations, a Gaussian distribution is reached within the range of values of Pearson coefficient (i.e. \([-1,1]\)), whose mean is around \(0\). This happens because it is more likely to find no correlation (i.e. \(0\)) between random permutations of grade and in-degree than perfect ones (i.e. \(-1,1\)). Moreover, as indicated by the straight vertical line at \(x = 0.85\), the Pearson coefficient of the initial sample is far away from the average correlation obtained by permuting data.

pnorm(cor_value, mean = mean(permutation), sd = sd(permutation), lower.tail = T)
## [1] 0.9999727

The function pnorm() assesses that the correlation coefficient of the original sample is above the \(99%\) of the coefficients computed by randomly permuting data. Hence, obtaining a Pearson coefficient $= $0.853 would be possible in less than \(1\%\) of the cases and it is therefore statistically significant.

Centralization and reciprocity

Centralization

Centralization is a question about how central some nodes are compared to other nodes, and it is a characteristic of the network itself. It is related to the distribution of the centrality inside the network5, while centrality is more related to the individual sphere of the nodes6.

In an undirected network, centralization is computed as follows:

\[ Centralization = \frac{\sum (\max-degree_i)}{(n-1)(n-2)} \] The most centralized network is the one that has a single node in the centre, which links every other node in the network. In this case, the maximum degree is \((n-1)\) and the centralization index returns \(1\):

\[ C_{max} = \frac{(24-1) \cdot 24+(24-24)\cdot1}{24\cdot23} = \frac{23\cdot24}{23\cdot24} = 1 \]

In the opposite case, namely in a decentralized network, every node is linked to everybody else and the centralization index is \(0\):

\[ C_{min} = \frac{(24-24)\cdot25}{24\cdot23} = \frac{0}{23\cdot24} = 0 \]

# Total degree centralization
sna::centralization(net,sna::degree)
## [1] 0.111413

In the bullying network, the centralization index is 0.111, which indicates a reasonably decentralized network.

Since the bullying network is directed, it is possible to split the centralization index into in-degree and out-degree. As the table below shows, the in-degree centralization index is higher than the out-degree one, indicating a more centralized bullied network (i.e. few kids are bullied from many) and a less centralized bullying network (i.e. nearly everybody bullies others).

Notice that both indexes are higher than the degree centralization index (i.e. 0.111), suggesting the presence of two more centralized sub-networks.

table = formattable(data.frame(
    "Centralization index" = round(sna::centralization(net,sna::degree),3),
    # In degree centralization
    "In degree Centralization Index" = round(sna::centralization(net,sna::degree, cmode="indegree"),3),
    # Out degree centralization
    "Out degree Centralization Index" = round(sna::centralization(net,sna::degree, cmode="outdegree"),3)))

names(table) = c("Centralization Index", "In degree Centralization Index", "Out degree Centralization Index")
table
Centralization Index In degree Centralization Index Out degree Centralization Index
0.111 0.259 0.172

Moreover, considering the two hypothetical extremes, meaning the maximum (i.e. \(1\)) and minimum (i.e. \(0\)) centralization indexes, it can be noticed that the computed ones are closer to the minimal reference point than to the maximal. It is worth noticing that the sub-networks (i.e. bullying and being bullied) are decentralized due to multiple central nodes.

Reciprocity

Before computing the reciprocity index, a census about dyads is performed to understand the state-of-art, meaning how many mutual, asymmetric and null dyads are present.

sna::dyad.census(net)
##      Mut Asym Null
## [1,]   4   68  228

Given \(25\) nodes, the maximal number of edges is \((25 \cdot 24)/2 = 300\), while the minimal is \(0\). In this specific network, there are:

  • 68 asymmetrical edges;
  • 228 null or missing edges;
  • 4 mutual edges: Francesco - Andrea, Flavio - Giulia, Angelo - Giulia and Marco - Sara.
# Mutual edges
E(i_net)[which_mutual(i_net)]
## + 8/76 edges from 16fec93 (vertex names):
## [1] Andrea   ->Francesco Angelo   ->Giulia    Flavio   ->Giulia   
## [4] Francesco->Andrea    Giulia   ->Angelo    Giulia   ->Flavio   
## [7] Marco    ->Sara      Sara     ->Marco

By looking at the table below with the nodes involved in the mutual edges, we can notice that:

  • Flavio and Marco bully just one person (i.e. Giulia and Sara respectively), but their victims, in turn, bully them, creating a cycle of attack and defense;
  • Giulia appears twice in these mutual edges, which represent half of her in and out-edges. This might suggest a counterattack technique: responding to the bullies by bullying them.
Name In degree Out degree Degree Normalized In degree Normalized Out degree
Andrea 7 2 9 0.29 0.083
Angelo 2 6 8 0.08 0.250
Flavio 1 1 2 0.04 0.042
Francesco 2 7 9 0.08 0.292
Giulia 4 4 8 0.17 0.167
Marco 8 1 9 0.33 0.042
Sara 9 2 11 0.38 0.083

As a further step, the reciprocity index is computed as the number of mutual dyads (i.e. \(M\)) in the network over the total amount of edges, namely the sum of mutual (\(M\)) and asymmetric edges (\(A\)):

\[ Reciprocity = \frac{2\cdot M}{2\cdot M+A} \]

sna::grecip(net, measure="edgewise")
##       Mut 
## 0.1052632

The computation above yields a reciprocity rate of 0.105, meaning that almost \(11 \%\) of the network drives to mutual relations between nodes (i.e., counterattack technique).

\[ Reciprocity = \frac{2\cdot 4}{2\cdot 4+68} = \frac{8}{76} = 0.105 \approx 11\% \]

Therefore, the arc-based reciprocity (also called edgewise or tie reciprocity) suggests a lack of mutual edges compared to the asymmetric ones, but not their absence. This statement should also be contextualized considering some reference points, such as the minimum and maximum possible reciprocity values. The former refers to the absence of reciprocity, namely a network with only asymmetric (or null) edges, and assesses a reciprocity rate of \(0\):

\[ R_{min} = \frac{2\cdot M}{2\cdot M+A} = \frac{2\cdot 0}{2\cdot 0+A}=0 \]

The latter instead assumes that all edges within the network are mutual. Therefore, the highest reachable value would be \(1\):

\[ R_{max} = \frac{2\cdot M}{2 \cdot M+A} = \frac{2\cdot M}{2\cdot M}=1 \] Instead, considering the dyadic measure for reciprocity, the proportion of mutual and null edges over the total, namely the sum of mutual \(M\), asymmetric \(A\) and null edges \(N\), yields an index of 0.773.

\[ R_{\text{dyadic}} = \frac{M+N}{M+N+A} \]

sna::grecip(net, measure="dyadic")
##       Mut 
## 0.7733333

Thanks to a different interpretation of the dyads, the obtained reciprocity value is considerably higher than before. Thus, the initial assumption relies on the definition of null edge as a symmetric non-relation, implying an absence of bullying acts from both sides. The census results highlight how the null edges represent the major type of edges (i.e., \(228\) over \(300\)), making possible to understand the high reciprocity rate.

A clear demonstration of null edges’ strong influence can be driven by computing the ratio of mutuals to non-null dyads.

\[ R_{\text{dyadic non null}} = \frac{M}{M+A} \]

The index falls to 0.056 and indicates that over the \(5\%\) of the non-null dyads is mutual.

sna::grecip(net, measure="dyadic.nonnull")
##        Mut 
## 0.05555556

  1. Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications↩︎

  2. Kirk, A. (2012). Data visualisation: a successful design process. Packt publishing LTD.↩︎

  3. Freeman, L. C. (1978), Centrality in social networks conceptual clarification. Social networks, 1(3), 215-239.↩︎

  4. Pitman, E. J. G. (1937). Significance Tests Which May be Applied to Samples From any Populations. Supplement to the Journal of the Royal Statistical Society, 4(1), 119–130.↩︎

  5. Freeman, L. C. (1978). Centrality in social networks conceptual clarification. Social networks, 1(3), 215-239.↩︎

  6. Everett, M. G., & Borgatti, S. P. (1999). The centrality of groups and classes. The Journal of mathematical sociology, 23(3), 181-201.↩︎